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Working memory is important for online language processing during conversation. We 
use it to maintain relevant information, to inhibit or ignore irrelevant information, and 
to attend to conversation selectively. Working memory helps us to keep track of and 
actively participate in conversation, including taking turns and following the gist. This paper 
examines the Ease of Language Understanding model (i.e., the ELU model, Ronnberg, 
2003; Ronnberg et al., 2008) in light of new behavioral and neural findings concerning the 
role of working memory capacity (WMC) in uni-modal and bimodal language processing. 
The new ELU model is a meaning prediction system that depends on phonological and 
semantic interactions in rapid implicit and slower explicit processing mechanisms that 
both depend on WMC albeit in different ways. It is based on findings that address the 
relationship between WMC and (a) early attention processes in listening to speech, (b) 
signal processing in hearing aids and its effects on short-term memory, (c) inhibition of 
speech maskers and its effect on episodic long-term memory, (d) the effects of hearing 
impairment on episodic and semantic long-term memory, and finally, (e) listening effort. 
New predictions and clinical implications are outlined. Comparisons with other WMC and 
speech perception models are made. 

Keywords: working memory capacity, speech in noise, attention, long-term memory, hearing loss, brain imaging 
analysis, oscillations, language understanding 



BACKGROUND 
OVERVIEW 

Some 30 years ago, we began a program of research to investi- 
gate the factors related to individual differences in speechreaders' 
ability to understand language. The findings underscored the 
importance of working memory capacity (WMC) for explaining 
those individual differences. In subsequent research, we extended 
our investigations to examine the associations between WMC and 
language understanding in other conditions, with the most recent 
focusing on audio-only speech understanding in adverse listening 
conditions by listeners using hearing aids. 

The Ease of Language Understanding model (i.e., the ELU 
model, Ronnberg, 2003; Ronnberg et al., 2008) was developed, 
tested, and refined in an attempt to specify the role of work- 
ing memory (WM) in a wide range of conditions in which 
people with normal or impaired hearing understand language. 
The language signal may be uni-modal or bi-modal speech or 
sign language and background conditions are realistic but pro- 
vide contextual support or environmental challenge to differing 
degrees. 



SPEECHREADING AS COMPENSATION FOR HEARING LOSS 

Hearing loss leads to poorer perception of auditory speech sig- 
nals and greater reliance on visual information available from the 
talker's face. Thus, we hypothesized, initially, that daily practice 
in visual speechreading by individuals with profound hearing loss 
or deafness would lead to superior, compensatory speechreading 
or speech understanding skills in comparison to normally hear- 
ing peers. One of the findings that motivated this hypothesis was 
that visual speechreading ability varies enormously between indi- 
viduals (see Ronnberg, 1995 for a review). To test this hypothesis, 
we conducted several studies of speechreading in well-matched 
groups of individuals with normal hearing, moderate hearing 
loss and profound hearing loss. Contrary to the prediction, there 
were no significant group differences and thus no evidence of 
compensation for hearing loss by better use of visual speech 
information. Results were similar irrespective of type of presen- 
tation (video vs. real-life audiovisual; Ronnberg et al., 1983), 
type of materials (digits vs. discourse; Ronnberg et al., 1983), 
for just-follow conversation tasks (Hygge et al., 1992), for differ- 
ent durations of impairment, and for different degrees of hearing 
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loss (e.g., Lyxell and Ronnberg, 1989; Ronnberg, 1990; Ronnberg 
et al., 1982, 1983). Spontaneous compensation for hearing loss 
through speechreading seemed, therefore, to be a cherished myth 
(Ronnberg, 1995). 

PERCEPTUAL AND COGNITIVE SKILLS 

These data prompted us to look for other ways to try to explain 
at least parts of the large variability in speech understand- 
ing observed across individuals (Ronnberg et al, 1998a, for an 
overview). In a set of studies, we identified the following pre- 
dictor variables: verbal inference-making (sentence completion, 
Lyxell and Ronnberg, 1987, 1989), context-free word decoding 
(Lyxell and Ronnberg, 1991), and information processing speed 
that relies on semantic long-term memory (LTM; e.g., lexical 
access speed, Ronnberg, 1990; as well as rhyme decision speed; 
Lyxell et al, 1994; Ronnberg et al., 1998b). Indirect predictors 
of sentence-based speechreading performance included the VN 
130/P200 peak-to-peak amplitude measure in the visual evoked 
potential (Ronnberg et al., 1989); WMC measured by the reading 
span test (Lyxell and Ronnberg, 1989; Pichora-Fuller, 1996); and 
verbal ability (Lyxell and Ronnberg, 1992). Overall, the indirect 
predictors were found to be related to sentence-based speechread- 
ing via their relationships with the direct predictors. This set 
of results demonstrated that WMC is strongly related to verbal 
inference-making, which in its turn is related to speechreading 
skill (Lyxell and Ronnberg, 1989); the amplitude of the visual 
evoked potential is related to speechreading via word decoding 
(Ronnberg et al., 1989); and verbal ability is related to speechread- 
ing via its relation to lexical access speed (Lyxell and Ronnberg, 
1992). 

OTHER MODALITIES OF COMMUNICATION 

A more general picture emerged as evidence accumulated that 
many of the predictor variables also related to other forms of 
communication. Successful visual-tactile speech communication 
and cued speech (i.e., a phonemic-based system which uses hand 
shapes to supplement speechreading) are predicted by phono- 
logical skills (e.g., Leybaert and Charlier, 1996; Bernstein et al., 
1998; Leybaert, 1998). The precision of a phonological represen- 
tation assessed by text-based rhyme tests has been shown to be 
an important predictor of the rate of visual-tactile (Ronnberg 
et al., 1998b; Andersson et al., 2001a,b), and visual speech track- 
ing (Andersson et al., 2001a,b). In the same vein, audio-visual 
speech understanding in cochlear implant (CI) users is pre- 
dicted by both phonological ability and individual differences in 
WMC measured using a reading span test (Lyxell et al., 1996, 
1998). 

IMPORTANCE OFWM 

Thus, about a decade ago, the data were pointing to an impor- 
tant role for WMC in predicting, directly or indirectly, the 
individual differences in speech understanding in one or more 
modalities. Testing participants who were hard-of-hearing or deaf 
provided clues as to how to re-conceptualize theories concern- 
ing speech understanding in individuals with normal hearing to 
take into account how their performance varies across a con- 
tinuum from ideal to adverse perceptual conditions. However, 



more direct tests of the hypothesis concerning the importance of 
WMC for speech understanding in atypical cases and conditions 
were needed. 

SPECIFYING THE ROLE OF WMC IN EASE OF LANGUAGE 

UNDERSTANDING 

DEFINING WM 

WM is a limited capacity system for temporarily storing and pro- 
cessing the information required to carry out complex cognitive 
tasks such as comprehension, learning, and reasoning. An indi- 
vidual's WMC, or span, is measured in terms of their ability to 
simultaneously store and process information. Importantly, com- 
plex WMC is the crucial ability when it comes to understanding 
language, viz being able to store and process information rela- 
tively simultaneously. Simple span tests, such as digit span, mainly 
tap storage functions in short-term memory, and tend not to be 
such good predictors of language comprehension, reading ability 
and speechreading ability [for an early review see Daneman and 
Merikle (1996); but see Unsworth and Engle (2007)]. We have 
usually assessed WMC using a reading span test (Daneman and 
Carpenter, 1980; Ronnberg et al., 1989; Just and Carpenter, 1992). 
In the reading span test procedure, the participant reads a sen- 
tence as quickly as possible and then performs a task to ensure 
that the sentence has been fully processed. After a small set of 
sentences has been presented, read and understood, the partici- 
pant is asked to recall either the first or last word of each of the 
sentences in the set in the order in which they were presented. 
Set size gradually increases and the WM span is determined to 
be the largest set size for which the individual can correctly recall 
a minimum specified proportion of the words. We have found 
in our research that the total number of words correctly recalled 
in any order, is a more sensitive predictor variable than set size 
(Ronnberg et al, 1989; Lunner, 2003; Ronnberg, 2003). The basic 
assumption is that, as the processing demands of the reading span 
task increase, there will be a corresponding decrease in how much 
can be stored in the limited capacity WM system. Total reading 
span score is used to gauge this trade-off between WM processing 
and storage. 

There is a strong cross-modal relationship between reading 
span scores (visual-verbal) and spoken communication skills 
(auditory- verbal), implying that it is supported by a modality- 
general ability (cf. Daneman and Carpenter, 1980; Just and 
Carpenter, 1992; Kane et al., 2004). This may explain why WMC 
has a predictive power that applies to several communicative 
forms (e.g., Ibertsson et al., 2009). Moreover, the reading span 
test (and other similar complex span tests) seems to tap into 
semantic processes such as inhibition of irrelevant information (in 
particular inhibition of context-irrelevant word-meaning; Gunter 
et al., 2003), the ability to selectively attend to one channel of 
information (Conway et al., 2001), the ability to divide atten- 
tion between channels (Colflesh and Conway, 2007), and the 
ability to store and integrate signal- relevant prior semantic cues 
(Zekveld et al, 2011a, 2012). The similiarity of results across 
the wide range of conditions applied in these studies supports 
the role of WMC in on-line language processing. Our research 
assessed the role of WMC during language understanding in var- 
ious conditions such as when speech is processed visually by 
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speechreaders, when auditory speech is heard in noise, by lis- 
teners with hearing impairment, when hearing aids are used, 
and when sign rather than speech is the signal used to convey 
language. 

EXTREME SPEECHREADING SKILL 

Case studies of extremely skilled speechreaders (Ronnberg, 1993; 
Lyxell, 1994; Ronnberg et al., 1999) demonstrated that bottom- 
up processing skills (e.g., lexical access speed and phonology) 
are only important up to a certain threshold, or level of lan- 
guage understanding. The threshold is assumed to be due to the 
efficiency of phonologically mediated lexical access, constrained 
by neural speed at different levels in the perceptual-cognitive 
system (Pichora-Fuller, 2003). We showed that to surpass this 
threshold, and to become speechreading experts, the individual 
has to be equipped with large complex WMC and related ver- 
bal inference-making and/or executive skills (see also Andersson 
and Lidestam, 2005). This seemed to be true irrespective of 
communicative habit — participant GS used tactile speechread- 
ing (Ronnberg, 1993), participant MM was bilingual in sign and 
speech (Ronnberg et al, 1999), and participant SJ used visual 
speechreading strategies only (Lyxell, 1994). The effects could not 
be explained in terms of age, degree of hearing loss, or even onset 
of the loss. 

NOISE, HEARING LOSS. AND HEARING AIDS 

The importance of predictions of individual differences in speech 
understanding based on reading span was crucially demonstrated 
when a strong association was found with spoken sentence recog- 
nition in noise by individuals with hearing loss irrespective of 
whether they were tested with or without hearing aids (Lunner, 
2003). During conversation, the individual who has impaired 
hearing must orchestrate the interplay between distorted percep- 
tual input, LTM, and contextual cues. We argue that the storage 
and processing abilities reflected in complex WMC tasks are 
essential for such compensatory interactions in people with hear- 
ing loss. WMC also seems to play an important role when people 
with normal hearing must understand language spoken in acous- 
tically adverse conditions (for discussions see Mattys et al., 2012; 
Pichora-Fuller et al, 1995; McKellin et al, 2007). 

SIGN LANGUAGE 

Insight into the role of WM in sign language communication also 
led to a series of studies at our lab investigating the neurocogni- 
tive mechanisms of WM for sign language (Ronnberg et al, 2004; 
Rudner et al, 2007, 2010, 2013; Rudner and Ronnberg, 2008a,b). 
These studies demonstrate similar neurocognitive mechanisms 
across language modalities with some modality-specific aspects. 
These language modality-specific differences include a greater 
involvement of superior parietal regions in WM for sign language 
and a de-emphasis of temporal processing mechanisms. 

Taken together, the speech understanding and sign language 
findings set the stage for formulating the Ease of Language 
Understanding model (ELU, see Ronnberg, 2003; Ronnberg 
et al, 2008) to extend existing more general models of WM 
in order to account for a wide range of communication 
conditions. 



THE ORIGINAL WM SYSTEM FOR EASE OF LANGUAGE 
UNDERSTANDING (ELU) 

The broader context of the ELU model is that of cognitive hear- 
ing science. Cognitive Hearing Science is the new field that 
has emerged in response to general acknowledgement of the 
critical role of cognition in communication (Arlinger et al, 
2009). Characteristic of cognitive hearing science models is that 
they emphasize the subtle balancing act, or interplay between 
bottom-up and top-down aspects of language processing (e.g., 
Schneider et al, 2002; Scott and Johnsrude, 2003; Tun et al, 
2009; Mattys et al, 2012). The ELU model describes how and 
when WM is engaged to support listening in adverse conditions, 
and how it interacts with LTM. In the original version we did 
not distinguish between episodic and semantic LTM but sub- 
sequent research and theoretical development have proven that 
this is an important distinction (see under EXTENDING THE 
ELU APPROACH:.). Episodic memory is memory of person- 
ally experienced events (tagged by time, place, space, emotion 
and context, see Tulving, 1983). Semantic memory refers to gen- 
eral knowledge, without personal reference (e.g., vocabulary and 
phonology). 

In the original model (see Figure 1; Ronnberg, 2003; Ronnberg 
et al., 2008; Stenfelt and Ronnberg, 2009), we assumed that 
multimodal speech information is Rapidly, Automatically, and 
Multimodally Bound into a PHOnological representation in an 
episodic buffer (cf. Baddeley, 2000, 2012) called RAMBPHO. 
RAMBPHO is assumed to operate with syllables that feed forward 
in rapid succession (cf. Poeppel et al., 2008; Bendixen et al, 2009). 
If the RAMBPHO-delivered sub-lexical information matches a 
corresponding syllabic phonological representation in semantic 
LTM, then lexical access will be successful and there is no need 
for top-down processing. And, if RAMBPHO continues to pro- 
vide matching syllabic information, lexical retrieval will continue 
to occur implicitly and at a rapid rate. The time-window for 
the assembly of the RAMBPHO information and for success- 
ful lexical retrieval is assumed to start when activation begins at 
a cortical level [superior temporal gyrus (STG)/posterior supe- 
rior temporal sulcus; (Poeppel et al., 2008)], where the neural 
binding of syllabic auditory and visual speech seems to occur 
(around 150 ms after speech onset, Campbell, 2008), and then 
it generally takes another 100-250 ms before lexical access pre- 
sumably occurs supported by neural mechanisms in the left 
middle temporal gyrus(MTG)/inferior temporal gyrus (Poeppel 
et al, 2008; see also Stenfelt and Ronnberg, 2009). If, however, 
the RAMBPHO information cannot be immediately related to 
phonological representations in semantic LTM or is not precise 
enough to match them unambiguously, lexical access is delayed, 
temporarily disrupting the feed-forward cycle of information 
flow. Explicit and deliberate WM processes are assumed to be 
invoked to compensate for this mismatch between RAMBPHO 
output and LTM representation. These explicit processes typi- 
cally operate on another time scale, measured in seconds rather 
than milliseconds (Ronnberg et al., 2008). Examples of such pro- 
cesses include inference-making, semantic integration, switching 
of attention, storing of information, and inhibiting irrelevant 
information. While the source of the mismatch is at the lexi- 
cal level, later explicit compensation may involve other linguistic 
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FIGURE 1 | The working memory model for Ease of Language Understanding (ELU, adapted from Ronnberg et al., 2008). 



levels. WMC is assumed to be required for most explicit process- 
ing aspects/subskills. 

Generally then, depending on the conditions under which the 
incoming speech signal unfolds (ambient noise, hearing impair- 
ment, signal processing in the hearing aid, etc.), as well as the pre- 
cision and quality of the semantic LTM representation, the relative 
contributions of explicit and implicit processes will continuously 
fluctuate during a dialogue. 

INITIAL TESTS OF THE MODEL 

The experimental studies performed to test the model have prin- 
cipally used two types of manipulation, one based on hearing aid 
signal processing and one based on presentation of text cues in 
order to induce a mismatch between RAMBPHO and semantic 
LTM. Initial testing was done primarily with spoken language pre- 
sented in auditory noise, but the model is also likely to be appli- 
cable to signed languages presented in visual noise (cf. Speranza 
et al, 2000). Brain imaging studies (e.g., Soderfeldt et al, 1994; 
Ronnberg et al., 2004; Rudner et al, 2007, 2013; Cardin et al, 
2013) suggest that similar but not identical neural networks are 
active for processing sign language and speech, and that the close 
relation between semantics and phonology in sign language may 
influence the mismatch mechanism (Rudner et al, 2013). 

USING AUDITORY SIGNAL PROCESSING MANIPULATIONS TO INDUCE 
PHONOLOGICAL MISMATCH 

Wide Dynamic Range Compression (WDRC) is one of the 
technologies used in modern digital hearing aids to increase 
speech intelligibility by applying non-linear amplification of the 
incoming signal such that soft sounds become audible without 
loud sounds becoming uncomfortable. However, this non-linear 
signal processing can also have side-effects that distort the 
phonological properties of speech, especially when compres- 
sion release is fast. We used this phenomenon to investigate 
the main prediction of WM-dependence in the ELU model in 
experienced hearing-aid users. Hearing aids were experimentally 
manipulated such that participants received WDRC for the first 



time. According to the model, given that a syllabic segment 
in the speech stream is processed with a new algorithm, the 
sound may seem different compared to the one delivered by the 
habitual algorithm, thus causing a relative RAMBPHO-induced 
mismatch with the phonological-lexical representation in LTM. 
Results showed that individual differences in WMC accounted 
for most of the variance in the threshold for 50% correct word 
recognition on speech-in-noise tests, irrespective of whether sta- 
tionary or modulated noise backgrounds were applied (Foo 
et al, 2007; cf. Desjardins and Doherty, 2013). This means 
that as long as we disrupt the habitual processing mode, WMC 
is invoked. 

A follow-up intervention study was conducted to investigate 
how the relationship between WMC and mismatch might change 
as the individual acclimatized to a new hearing aid algorithm. 
Again, participants who were habitual hearing aid users were 
switched to a new fast or slow signal processing algorithm in the 
hearing aid. After nine weeks of experience with one kind of hear- 
ing aid compression, participants were tested either with the same 
kind of compression ("matching" conditions), or with the other 
kind of compression of which they had no experience ("mis- 
matching" conditions). As predicted, in one study conducted in 
Swedish (Rudner et al., 2009a,b) and in another conducted in 
Danish (Rudner et al., 2008), thresholds for 50% correct word 
recognition on speech-in-noise tests for mismatching compres- 
sion conditions were correlated with WMC. WMC was not the 
main predictor of speech-in-noise thresholds for matching con- 
ditions. Independent studies support the notion that WMC is 
crucial to speech understanding in adverse conditions by hearing 
aid users (Gatehouse et al., 2003, 2006; Lunner, 2003; Akeroyd, 
2008; Rudner et al., 2011; Mattys et al, 2012). 

Using the visual letter monitoring task as an index of 
WMC, Lunner and Sundewall Thoren (2007) showed that WMC 
accounted for about 40% of the variance in the ability to perceive 
speech in modulated background noise with FAST compression. 
Pure-tone average hearing loss, on the other hand, accounted for 
less than 5% of the variance. Importantly, the pattern was reversed 
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when compression by the hearing aid was SLOW and tests were 
conducted in steady-state noise conditions: WMC explained only 
5% of the variance while pure-tone hearing loss explained 30%. 
Lunner and Sundewall Thoren (2007) suggested that FAST com- 
pression in modulated noise backgrounds better reflects the more 
rapid changes in the signal and noise characteristics of every- 
day listening conditions. Hence, using SLOW compression in 
steady-state noise conditions may underestimate everyday cog- 
nitive demands (cf. Festen and Plomp, 1990). The conclusions 
drawn in the studies using WDRC are consistent with recent 
findings in which other advanced hearing aid signal processing 
algorithms were used: Arehart et al. (2013) found that a high 
degree of frequency compression reduced intelligibility more for 
individuals with low WMC compared to individuals with high 
WMC, especially for older adults. 

The emerging picture seems to be that advanced signal pro- 
cessing algorithms designed to improve intelligibility and listen- 
ing comfort may also generate RAMBPHO-dependent mismatch 
due to distortions at the syllable level caused by unfamiliar ampli- 
tude or frequency compression. Thus, there is a benefit and a cost 
from such signal processing. Mismatches, or costs, are overcome 
more successfully by individuals with high WMC. 



USING TEXTUAL MANIPULATIONS TO CREATE PHONOLOGICAL 
MISMATCH 

Severe hearing loss can lead to phonological deterioration in 
semantic LTM (Andersson, 2002; Lazard et al., 2010; Ronnberg 
et al, 2011b). Classon et al. (2013a) undertook a study that tested 
the hypothesis that high WMC can compensate for poor phono- 
logical skills in individuals with hearing impairment. To avoid 
audibility problems, phonological mismatch was manipulated 
using text rather than speech. Classon et al. (2013a) showed that 
hearing impairment negatively affected performance on a text- 
based task in which participants decide if two words rhyme or not 
in RAMBPHO-dependent, mismatching conditions. Mismatch 
was created in conditions where the two test words rhymed 
but were orthographically dissimilar, or alternatively, did not 
rhyme but were orthographically similar (Lyxell et al, 1993, 1998; 
Andersson and Lyxell, 1998; Andersson, 2002). In the latter case, 
orthographic similarity may induce an incorrect "yes" response 
when words do not rhyme, if the phonological precision of rep- 
resentations in semantic LTM is compromised. The prediction 
based on the ELU model is that participants who have a high 
WMC will be able to compensate for poor phonological repre- 
sentations because they can keep representations in mind and 
double-check back and forth to ensure that the words really do 
not rhyme before they decide. The data confirmed this predic- 
tion. Hearing impaired participants with high WMC performed 
on a par with normal hearing participants, whereas hearing 
impaired participants with low WMC displayed higher error 
rates than the normal hearing subgroups with low WMC. Note 
that hearing impairment did not confound the results since the 
level of WMC was matched across groups with normal hearing 
and hearing impaired participants, and there was no difference 
in the degree of hearing impairment between the high vs. low 
WMC subgroups. 



SEMANTIC STRATEGY IN RHYME TASKS 

The effects of hearing impairment on the mismatching condi- 
tions in the yes/no rhyme task may be attributed to imprecise 
phonological representations in semantic LTM. This may lead 
automatically to a non-phonological orthographic bias, and per- 
haps even a semantic bias, when written words are presented in 
a rhyme task, especially for individuals with hearing impairment 
who have a low WMC. The plausibility of such an explanation 
was reinforced by the finding that participants with low WMC 
outperformed participants with high WMC on subsequent inci- 
dental recognition of items that had been correctly identified in 
the initial rhyme testing phase. Since semantic processing has 
been shown to promote episodic LTM (e.g., Craik and Tulving, 
1975), a semantic interpretation of this seemingly paradoxical 
result may fall into place. 

Connected to this semantic interpretation of the rhyming data, 
is the fact that the test of WMC that we have been discussing 
so far, the reading span test, also measures important semantic 
interpretation processes. Although the semantic absurdity judg- 
ments typically demanded in this task (Ronnberg et al, 1989) 
were initially introduced to ascertain that the participants actually 
processed the whole sentence rather than strategically focusing 
only on the first or final words, semantic processing may in itself 
be an important component of the test. Indeed, sentence comple- 
tion ability (tapping semantic integration and grammar) under 
time pressure is significantly correlated to performance in the 
reading span test (Lyxell and Ronnberg, 1989). Although the read- 
ing span test taps into several storage and processing components 
summarized by one measured variable, a semantic perspective on 
reading span may cast new light on old data in that rapid sense- 
making and semantic judgment is demanded in the reading span 
test as well as in the sentence-completion task. 

NEURAL SIGNATURES OF TEXT-SPEECH SEMANTIC MISMATCH 

WMC, again measured with the reading span task, has in recent 
studies been shown to modulate the ability to use semantically 
related cues and to suppress unrelated, "mismatching" cues to 
help understand speech in noise (Zekveld et al, 2011a, 2012). 
Interestingly, both the WMC of the participants and the lexicality 
of text cues modulated neural activation in the left inferior frontal 
gyrus (LIFG) and the STG during speech perception. Presumably, 
these areas are related to compensatory processes in semantic cue 
utilization. Independent data also suggest that LIFG is involved in 
semantic and syntactic processing networks (Rodd et al., 2010). 
Cortical areas beyond the temporal lobe are engaged in the pro- 
cessing of intelligible but degraded speech (Davis and Johnsrude, 
2007). It is quite plausible that there is a functional connectivity 
between LIFG and STG, and that LIFG modulates STG via top- 
down connections when semantic processing is involved (Obleser 
and Kotz, 2011). In fact, the general picture is that there are ven- 
tral and dorsal pathways that connect pre-frontal and temporal 
language-relevant regions which support semantic and syntactic 
processes (Friederici and Gierhan, 2013). 

INTERIM SUMMARY 

Thus far, we can infer the architecture of a WM system (i.e., 
the ELU model) that is invoked when there is some kind of 
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signal processing that changes the phonological structure of the 
speech signal, or when there is a combination of signal process- 
ing and fluctuating background noise that puts large demands on 
phonological processing. In addition, there is also new evidence 
to suggest that a semantic mismatch requires WM resources to 
help focus on the target-speech signal while inhibiting distracting 
semantic cue information as will be discussed further below. 

EXTENDING THE ELU APPROACH: WMC RELATED TO 
ATTENTION, MEMORY SYSTEMS, AND EFFORT 

In the following sections, evidence is reviewed indicating that 
WMC plays a part in (a) "early" attention processes, (b) short- 
term retention of spoken information when the signal is pro- 
cessed by hearing aids, (c) inhibition and episodic LTM of masked 
speech, (d) the effects of hearing impairment on episodic and 
semantic LTM, and (e) listening effort. These data — behavioral 
and physiological — have shaped a new version of the ELU model, 
which will be presented subsequently. 

WMC INFLUENCES EARLY ATTENTIONAL PROCESSES 

This section suggests that high WMC is associated with neural 
interactions that facilitate attention and which are important for 
further speech signal processing (Peelle and Davis, 2012). This 
kind of cognitive tuning of the brain does not seem to involve 
any explicit processing component, although it is dependent 
on WMC. 

WMC is related to the ability to inhibit processing of irrelevant 
information and overrule undesired but pre-potent responses 
(e.g., Kane et al, 2001; Engle, 2002). More precisely, high- WMC 
individuals appear to have a superior ability to modulate atten- 
tion span (i.e., how much information that has access to cognitive 
processing). Where in the processing chain filtering out of irrele- 
vant information takes place is still a subject of debate. Relations 
between WMC and early cortical auditory processing (as reflected 
in the amplitude of the Nl component of event-related poten- 
tial measures) have been demonstrated with greater amplitudes 
for attended sound and lesser amplitudes for ignored sound in 
high-WMC individuals (Tsuchida et al, 2012). However, a recent 
experiment in our lab (Sorqvist et al., 2012a) suggests that WMC 
is involved in filtering processes at even earlier (sub-cortical) 
stages. Normally-hearing participants visual-verbal performed a 
visual M-back (1-, 2-, 3-back) task (Braver et al, 1997) while 
being presented with to-be-ignored background sound. In a con- 
trol condition, the participants just heard the sound and did not 
perform any task. In the n-back task, WM load increased with 
increasing n and the control condition represented least load. The 
magnitude of the auditory brain stem response (ABR, wave V, on 
average 7 ms post-stimulus onset) was negatively associated with 
WM load. Moreover, higher WMC scores were related to a greater 
difference of the ABR between conditions. Thus, both the exper- 
imental load manipulation and correlational evidence converge 
on the same conclusion: early attentional processes interact with 
WM. Our interpretation is that cognitive load reduces resources 
at the peripheral level, and the relation with WMC suggests a 
relationship between central and peripheral capacity. 

One mechanism underpinning this relation might be the 
alpha rhythm. Alpha rhythms reflects the cognitive system's 



pre-stimulus preparation for incoming stimuli, enabling efficient 
processing (Babiloni et al., 2006), and have been associated with 
both WM load and processing of acoustically degraded stimuli 
(Obleser et al, 2011). Moreover, in a recent focused review of 
brain oscillations and WM, it was suggested that the alpha rhythm 
serves as an attentional gate-keeper to optimize the signal-to- 
noise ratio for WM-based processing, and that the number of 
gamma cycles that fit within one theta cycle may index WMC 
(Freunberger et al, 2011). 

However, single indices may only tell part of the story of how 
brain oscillations relate to WM. In a recent review, it has been 
argued that the correlations between oscillatory phases in dif- 
ferent brain regions, so called phase synchronization, affect the 
relative timing of action potentials. This is important for a mem- 
ory system such as WM, which in turn depends on the interaction 
between different storage and executive processing components 
(and their corresponding phases), for example, phase correlations 
between pre-frontal and temporal regions (Fell and Axmacher, 

2011) . 

Thus, a high WMC may facilitate neural fine-tuning at 
an early level of auditory processing (cf. Pichora-Fuller, 2003; 
Sorqvist et al., 2012a) but may also reflect a highly synchro- 
nized brain network (Fell and Axmacher, 2011). The conclu- 
sion about some kind of fine-tuning is further reinforced by 
the finding that WM processes are interconnected with the 
effects of practice on auditory skills (Kraus and Chandrasakaren, 
2010) and their corresponding neural signatures (Kraus et al., 

2012) . 

All in all then, data from independent labs suggest that WMC 
is related to several brain oscillation indices, and that WMC is 
related to early attention processes. This WMC-based top-down 
influence on speech-relevant attention processes may be part of 
the explanation as to why attending to a speaker in a multi- 
talker situation gives rise to dedicated neural representations 
(Mesgarani and Chang, 2012). 

WMC INTERACTS WITH SIGNAL PROCESSING AND SHORT-TERM 
RETENTION 

This section presents data showing for the first time that hearing 
aid signal processing can improve short-term memory in hearing- 
impaired individuals, and that this effect is modulated by WMC 
(Ng et al., 2013a,b). This may prove to have important clinical 
consequences (Piquado et al, 2012). 

Even when audibility is controlled (e.g., by amplifying speech 
with hearing aids), individuals with hearing impairment still 
perform worse than young normal-hearing subjects, with cog- 
nitive factors accounting for residual variance in performance 
(e.g., Humes, 2007). Attentional resources may contribute to 
speech understanding, especially in effortful or divided attention 
tasks (Tun et al., 2009; Ronnberg et al, 2011a,b). For example, 
Tun et al. (2009) have shown poorer delayed recall for audi- 
ble auditory stimuli in participants with impaired compared to 
normal hearing when encoding took place under divided atten- 
tion conditions. Ronnberg et al. (2011a,b) also demonstrated that 
short-term memory performance under divided attention encod- 
ing conditions correlated with degree of hearing impairment (cf. 
Humes et al, 2006). 
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Hearing aid signal processing schemes may reduce attention 
costs while listening to speech in noise and thus improve speech 
understanding. It has been demonstrated that noise reduction 
signal processing reduces listening effort for people with nor- 
mal hearing (Sarampalis et al., 2009). In a recent study (Ng 
et al, 2013a,b), we examined how hearing aid signal processing 
influences word recall in people with hearing impairment. The 
scheme under investigation was binary time-frequency masking 
noise reduction (Wang et al., 2009). Each participant listened to 
sets of eight sentences from the Swedish Hearing-In-Noise-Test 
(HINT) materials (Hallgren et al, 2006) in 4-talker babble or 
stationary noise, with and without noise reduction. To control 
audibility, SNRs were individualized such that performance lev- 
els were around 95% for word recognition in stationary noise 
with individual linear amplification and individually prescribed 
frequency response. Typical SNRs for 95% correct were around 
+5 dB. Each participant recalled as many sentence-final words as 
possible after each set of sentences had been presented. We found 
that participants performed worse in noise than in quiet and that 
this effect was partially restored by noise reduction. In particu- 
lar, individuals with high WMC recalled significantly more of the 
items from the end of the lists (recency position) presented in 
noise when noise reduction was used. 

Thus, WMC interacts with signal processing in hearing aids 
and facilitates short-term memory. There is obviously room for 
improvement even when the audibility of the signal is good, a 
fact that offers a new perspective on how to conceptualize benefits 
from different kinds of signal processing in hearing aids. 

WMC— ESPECIALLY THE INHIBITORY ASPECTS— DETERMINE 
EPISODIC LTM FOR PROSE MASKED BY SPEECH 

This section is about how WMC relates to inhibition of an inter- 
fering talker during listening to sentences and to later long-term 
episodic recall. 

We have recently shown that WMC seems to be related to 
long-term retention of information that is conveyed by masked 
speech (Sorqvist et al., 2012b). Young, normally-hearing students 
listened to invented stories (each about 7.5 min long) about fake 
populations and afterwards answered questions about their con- 
tent (e.g., what did the lobiks wear in the kingdom of death?). The 
stories were spoken in a male voice and masked by another male 
voice (normal or spectrally-rotated; Scott et al., 2009). 

Two types of complex WMC tests were administered sep- 
arately: the reading span and the size-comparison (SIC) span 
test (Sorqvist et al, 2010). The SIC span is a WMC test that 
targets the ability to resist semantic confusion. It involves com- 
paring the size of objects while simultaneously maintaining and 
recalling words taken from the same semantic category as the to- 
be-compared words. The distinguishing feature of the test is that 
the semantic interference between the comparison words and the 
to-be-recalled words must be resolved by inhibiting the potential 
semantic intrusions from the comparison words. 

Ability to answer content questions was superior when the 
story was masked by a rotated as compared with a non-rotated 
speech signal. More importantly, SIC span was a better predic- 
tor variable than reading span of the magnitude of this difference 
(Sorqvist et al., 2012b). We argue that the inhibition ability 



tapped by SIC span is involved during resolution of the confusion 
between competing and target speech and that better resolution 
enhances episodic encoding and retrieval. This will, at least in 
part, determine an individual's ability to remember the important 
parts of a conversation. 

Speech-in-speech processing studies have typically addressed 
speech perception as such (e.g., Bronkhorst, 2000). Our contri- 
bution is that we associate WMC — and the inhibition component 
in particular — with the encoding carried out during speech- 
in-speech comprehension, and how this type of WMC encod- 
ing relates to episodic LTM (cf. Hannon and Daneman, 2001; 
Schneider et al, 2010). There is some evidence of a relation 
between episodic LTM and cognitive spare capacity (Rudner et al, 
2011;Mishra etal, 2013). 

DEGREE OF HEARING IMPAIRMENT IN HEARING AID USERS IS 
ASSOCIATED WITH EPISODIC LTM 

This section summarizes a recent cross-sectional study on a sam- 
ple of hearing aid users and how their hearing thresholds are 
associated with the efficiency of different memory systems. 

Despite the possibility of using hearing aids, hearing problems 
continue to occur in everyday listening conditions. Many people 
who own hearing aids do not use them on a regular basis. For 
those who do wear them regularly, signal processing algorithms 
in hearing aids cannot generally provide an optimal listening sit- 
uation in noisy and challenging conditions (Lunner et al., 2009). 
By including hearing aid users (n = 160) from the longitudinal 
Betula study of cognitive aging (Nilsson et al., 1997), we made 
a conservative test of the hypothesis that hearing impairment is 
negatively related to episodic LTM deficits. The basis of the pre- 
diction from the ELU model (Ronnberg et al., 2011a,b) is that 
mismatches will remain despite the use of a hearing aid, and 
hence fewer items will be encoded and retrieved from episodic 
LTM. Therefore, we assume a disuse effect on episodic LTM, 
leading to a less efficient episodic memory system. However, 
short-term memory (STM, here operationalized by Tulving and 
Colotla, 1970; the Tulving and Colotla lag measure) and WM (not 
explicitly measured in this study) should be increasingly active 
in mismatching conditions because both systems would be con- 
stantly occupied during retrospective disambiguation of what had 
been said in a conversation. Thus, both STM and WM would be 
relatively less vulnerable to disuse. It is also predicted that semantic 
LTM should be highly correlated with episodic LTM because the 
status of phonological representations in semantic LTM should 
be tightly related to the success of encoding into episodic LTM. 
These predictions have recently been confirmed by structural 
equation modeling. Episodic LTM decline is related to long-term 
hearing impairment, despite the use of existing hearing aid tech- 
nology (Ronnberg et al., 201 la,b). One note of caution though is 
that exact measures of every-day hearing aid use were not avail- 
able. Hence, any potential dose-response relationship among the 
hearing aid wearers could not be assessed. 

Thus, hearing loss was independently related to episodic LTM 
(verbal recall tasks) and semantic LTM (initial letter fluency and 
vocabulary) but unrelated to STM, even when age was accounted 
for. Visual acuity alone, or in combination with auditory acuity, 
did not contribute to any acceptable structural equation model; it 
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only made the prediction of episodic LTM decline worse by stan- 
dard goodness-of-fit criteria (see also Lindenberger and Ghisletta, 
2009). And finally, even when the episodic LTM tasks were of non- 
auditory nature (i.e., motor encoding of lists of imperatives and 
subsequent free recall of these actions, Nilsson et al, 1997) the 
association with hearing loss persisted (Ronnberg et al, 201 la,b). 

Although the participants wore their hearing aids whilst com- 
pleting the auditory episodic memory tasks, this negative result 
may be accounted for in terms of perceptual stress, or infor- 
mation degradation (cf. Pichora-Fuller et al, 1995). It has been 
argued and empirically demonstrated that once perceptual stress 
is equated for example among different age groups, differences 
in performance on WM, associative memory and comprehension 
tasks (e.g., Schneider et al., 2002) tend to vanish. Nevertheless, 
the decreased performance in the non-auditory tasks reported 
in Ronnberg et al. (2011a,b) cannot be explained on the basis 
of information degradation and it is possible that there are 
both information degradation and long-term deprivation effects. 
Central mechanisms involving attentional resources could also 
be affected by hearing impairment, which in turn would predict 
problems with memory encoding (Tun et al., 2009; Majerus et al., 
2012; Peelle and Davis, 2012; cf. Sorqvist et al., 2012a) and pos- 
sibly WMC (see also Schneider et al., 2010). Before we can reach 
definite conclusions about the selective effects of hearing impair- 
ment on memory systems, a broader spectrum of tasks assessing 
different memory systems must be employed. 

From a more general and clinical perspective, we suggest that 
future longitudinal studies should evaluate the effects of the use of 
the hearing aids on cognition and memory systems, and in partic- 
ular, the effects of certain kinds of signal processing on different 
tasks assumed to index different memory systems. 

WMC AND EFFORT 

In this section, we discuss recent work related to the ELU pre- 
diction about WMC and effort (cf. Hervais-Adelman et al, 2012; 
Amichetti et al, 2013). In particular, we focus on predictions 
based on recent data using pupillometry that contrast with the 
ELU prediction. 

Apart from taxing cognitive capacity, listening under adverse 
conditions is often associated with subjectively experienced effort, 
especially in individuals with hearing impairment (Pichora- 
Fuller, 2006). The ELU prediction about effort, or the inverse 
notion of "ease" (Ronnberg, 2003) is that in effort-demanding lis- 
tening situations, an individual with a high WMC will be better 
able to compensate for the distorted signal, without exhausting 
WMC and therefore experience less effort in comparison to an 
individual with small WMC (cf. the neural efficiency hypothesis; 
e.g., Pichora-Fuller, 2003; Heitz et al, 2008), given that the task 
does not hit ceiling/floor (Ronnberg, 2003). Intermediate diffi- 
culty levels provide the best opportunity for explicit processes to 
operate in a compensatory fashion. Recent work by our group has 
confirmed that higher WMC is associated with lower perceived 
and rated listening effort for intermediate levels of difficulty, or 
"ease" of processing (Rudner et al, 2012). We suggest that sub- 
jective effort ratings may be useful for understanding the relative 
contributions of explicit WM processes to speech understanding 
in challenging conditions (Rudner et al, 2012; Ng et al., 2013b). 



Some researchers have proposed that the pupillary response 
reflects cognitive processing load during the processing of sen- 
tences of different grammatical complexity (Piquado et al, 2010; 
Zekveld et al., 2010). This response is also sensitive to age, hearing 
loss, and the extra effort required to perceive speech in compet- 
ing talker conditions compared to noise maskers (Zekveld et al., 
2011b; Koelewijn et al., 2012a). Koelewijn et al. (2012b) observed 
that people with high SIC spans demonstrated larger pupil size, 
and that higher SIC span performance, in turn, was related 
to lower signal-to-noise ratios needed to perform at a certain 
threshold level in the competing talker condition. This pattern 
of findings may suggest that cognitive load is actually increased 
by high WMC, which can be viewed as a paradoxical result, but 
has support in the literature (Van der Meer et al., 2010; Zekveld 
et al., 2011b). Another interpretation of these data is that indi- 
viduals with a high capacity solve difficult stimulus conditions by 
consuming more cognitive brain resources (more extensively or 
more intensively), thus exercising greater task engagement, and 
this is what is reflected in the pupil size variations (Koelewijn 
et al., 2012b; see Grady, 2012). 

Pupil size seems to reliably capture cognitive load and asso- 
ciated effort under certain semantic or informational masking 
conditions. The exact mechanisms behind the empirical find- 
ings so far remain to be elucidated. But clinically, irrespective of 
explanatory mechanism, pupil size may become a complementary 
measure to subjective ratings of effort. 

GENERAL DISCUSSION AND A NEW ELU-M0DEL 

Phonological and semantic mismatches increase the dependence 
on WMC in speech-in-noise tasks. However, as we have seen 
in the current review of recent ELU-related WMC studies, the 
role of WMC is extended to include early attention mechanisms, 
interactions with memory systems under different multi-talker 
conditions, both for short-term and LTM, and a relationship to 
effort via subjective and objective measures. 

Below we present the new empirical extensions that emerge 
from our recent data inspired by the old ELU model (Ronnberg 
et al., 2008). Then, we describe the new ELU model, based on 
these new empirical patterns, emphasizing in general and in detail 
the new features that differ from the old model. A section on 
predictions will close the presentation of the new model. In the 
following section, the new ELU model is compared to other 
relevant WM and speech perception models. The paper ends 
by addressing some important clinical consequences that follow 
from the model. 

NEW EMPIRICAL EXTENSIONS 

First, the data we have presented and discussed suggest that 
several kinds of signal processing in hearing aids (i.e., fast ampli- 
tude compression, frequency compression, and binary masking), 
designed to facilitate speech perception, are handled best by 
individuals with high WMC. This is the first extension from 
the original studies that informed the development of the ELU 
model (Ronnberg et al., 2008). At that time, we did not know 
whether WMC was important for just one kind of distortion 
induced by signal processing (i.e., fast amplitude compression) 
or not. Importantly, when some kind of distortion of the signal 
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is introduced, the feed-forward mechanism (cf. Bendixen et al, 
2009) of RAMBPHO that predicts yet-to-be-experienced (syl- 
labic) elements in the unfolding sound sequence (cf. Poeppel 
et al, 2008; Bendixen et al., 2009) seems to be temporarily inter- 
rupted, allowing ambiguous information to enter an explicit 
processing loop before understanding can be achieved. 

A second extension is related to the pre-tuning or synchroniza- 
tion by WMC, directly or indirectly, prior to or early on during 
stimulus presentation. One type of prior influence mediated by 
WMC relates to "early" attention processes (Fell and Axmacher, 
2011; Freunberger et al., 2011; Sorqvist et al, 2012a), another is 
related to priming, or pop-out (e.g., Davis et al., 2005). Recent 
data suggest that the magnitude of the pop-out effect may be 
mediated by WMC (Signoret et al, 2012). A third kind of influ- 
ence exerted by WMC relates to memory encoding operations 
(Sorqvist and Ronnberg, 2012), and subsequent influences on 
understanding, including turn-taking in a dialogue (Ibertsson 
et al, 2009). This kind of continuous feedback was not part of the 
old ELU model (Ronnberg et al, 2008). This means that the new 
model also acknowledges a post-dictive, explicit feedback loop, 
feeding into predictive RAMBPHO processing. This mechanism 
is akin to the hypothesis testing, analysis-by-synthesis aspect of 
the Poeppel et al. (2008) framework (see below under Relation to 
other models). 

A third extension has to do with the role of WMC in process- 
ing text cues that generate explicit semantic expectations of what 
will come in the unfolding speech stream. WMC is particularly 
important when expectations are violated by the content of the 
speech signal (Zekveld et al, 2011a). This may be because indi- 
viduals with high WMC have a superior ability to inhibit the 
cue-activated, mismatched representation in semantic memory 
(cf. Nostl et al, 2012; Sorqvist et al, 2012b). The discovery of 
a semantic influence on RAMBPHO processing means that the 
theoretical assumption of the model must be revised (see fur- 
ther below). Further, research suggests that older people more 
frequently rely on semantic context. For older people, incon- 
gruent semantic context seems to impair identification of words 
in noise, although confidence levels are higher than in younger 
adults (Rogers et al, 2012). Older people have a smaller WMC 
than younger individuals while frontal-lobe based executive func- 
tions may remain intact and this may account for the false hearing 
effects (Rogers et al., 2012). Also, over many decades of greater 
reliance on context in the face of gradual age-related declines 
in sensory processing, there may be changes in brain organiza- 
tion with an anterior-posterior shift in the brain areas engaged in 
complex tasks (Davis et al., 2008). 

A fourth extension is that high WMC individuals can deploy 
more resources to both semantic and phonological aspects of a task, 
depending on instruction. The versatility in types of processing 
(phonological and semantic) of high WMC people represents a 
feature that was lacking in the old model. For example, in the 
Sorqvist and Ronnberg (2012) study WMC contributed to inhibi- 
tion of a competing talker while focusing on the semantic content 
of the target talker. A consequence of this is enhanced, or deeper, 
understanding (Craik and Tulving, 1975). The by-product is more 
durable episodic memory traces (Classon et al, 2013a). In a recent 
ERP study Classon et al. (2013b) showed that hearing impaired, 



but not normal hearing individuals, demonstrate an amplified 
N2-like response in non-rhyming, orthographically mismatch- 
ing conditions. This ERP signature of hearing impairment is 
suggested to involve increased reliance on explicit compensatory 
mechanisms such as articulatory recoding and grapheme-to- 
phoneme conversion and may prove to tap into some phonologi- 
cal WM function. 

A fifth important extension encompasses the negative relation- 
ships between hearing loss and episodic and semantic LTM. These 
occur despite the use of hearing aids. However, STM is relatively 
unaffected, presumably because the demand to resolve speech 
understanding under mismatching, adverse conditions keeps this 
memory system in a more active state. Therefore, the overall selec- 
tive effects on different memory systems are couched in terms 
of use/disuse (Ronnberg et al., 2011a,b). It should be noted that 
although the ELU prediction is in terms of relative effects of 
use/disuse, it does not exclude the possibility that either STM or 
WM maybe affected by hearing impairment (cf. Van Boxtel et al., 
2000; Cervera et al., 2009); it only predicts a relatively larger LTM 
impairment. 

A sixth general fact to note is the modality-generality of mem- 
ory systems in relation to language understanding. Reading span 
obviously taps modality-general verbal WMC as it predicts vari- 
ance in the speech-in-noise tasks (Akeroyd, 2008; cf. Daneman 
and Carpenter, 1980; Just and Carpenter, 1992). Generality is 
also a key feature of the modulation of auditory attention (ABR) 
by manipulating visual-verbal WM load (Sorqvist et al., 2012a). 
Finally, a striking finding in the (Ronnberg et al., 2011a,b) study 
is that the negative memory consequences that may be attributed 
to hearing loss also show an independence of encoding format, 
and is not uniquely related to auditory encoding: At the level 
of simple correlations, hearing loss showed the highest nega- 
tive correlation to free recall performance on tasks which not 
only involved auditory encoding but also encoding of motor and 
textual representations — and the effects were still manifest after 
statistically correcting for age. 

Seventhly, and finally, the effect of WMC on stimulus process- 
ing is pervasive in terms of the time window: from early brain stem 
responses to encoding into episodic LTM. Thus, the above gen- 
eralizations have set the stage for a more analytical and general 
formulation of the ELU model. 

THEORETICAL CONSEQUENCES 

The new extensions result in a better specified ELU model that 
presents WM as the arena for interpretating the meaning of an 
ongoing dialogue. An individual with high WMC is more capable 
of using different levels/kinds of information and implicit/explicit 
strategies for extracting meaning from a message. The storage 
and processing operations that are performed by a high-capacity 
system are modality-general and flexible during multi-tasking. 
Implicit and explicit processes are assumed to run in parallel and 
interactively, but under different time windows (cf. Poeppel et al., 
2008). 

The successful listener disambiguates the signals on-line over 
time, due to successive semantic and lexical retrieval attempts, 
combined with contextual and dialogical constraints to narrow 
down the set of lexical candidates cued in the speech stream. 
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Because of time constraints in dialogues, the listener may often 
settle for the gist without resolving all of the details of the signal- 
meaning mapping. It may even be the case that the context is so 
strongly predictive that very little if any information delivered by 
RAMBPHO is needed for successful recognition to occur (Moradi 
etal., 2013). 

We now further assume that the information delivered by 
RAMBPHO is relayed by a fast-forward, matching mechanism 
that is nested under a slow, explicit feedback loop (cf. Poeppel 
et al, 2008; Stenfelt and Ronnberg, 2009). The mismatch in itself 
is determined either by poor RAMBPHO information and/or 
poor phonological representations in semantic LTM. We concep- 
tualize the phonological representations in LTM in terms of mul- 
tiple attributes. A minimum number of attributes are required for 
access to a certain lexical item. Above a certain threshold there is 
a sufficient number of attributes to trigger the lexical representa- 
tion. Below threshold, we can expect a number of qualitatively 
different outcomes: (a) if the number of attributes is close to 
threshold, then some phonological neighbors may be retrieved 
(Luce and Pisoni, 1998); (b) if too few attributes match the 
intended target item, the matching process could be led astray by 
contextual constraints induced by "mismatching" semantic cues 
(Zekveld et al., 2011a); and (c), if no phonological attributes are 
present at the RAMBPHO level, it could still be the case that a 
sentence context is so predictive that an upcoming target word is 
very likely to be activated anyhow (Moradi et al., 2013). 

The matching process is ultimately determined by the fidelity 
of the input and phonological representation. Fidelity is affected 
by external noise but also by internal noise (e.g., by poor phono- 
logical representations due to long-term hearing impairment; 
Classon et al, 2013a,b). These phonological attributes are primar- 
ily constrained at a syllabic level of representation. RAMBPHO 
information is based on rapid phonological extraction from the 
signal by means of a mix of visual, sound-based and motoric pre- 
dictions (cf. Hickok, 2012). We still propose that the bottleneck of 
the system is the connection between RAMBPHO delivered infor- 
mation and the phonological-lexical representation. However, we 
now also assume that the phonological attributes are embedded 
within domains of semantically related attributes; i.e., relations 
between the two types of attributes are assumed to be stored 
and represented together (cf. Hickok, 2012). Thus, these synergis- 
tic representations allow lexical access both via RAMBPHO and 
semantic cueing (Zekveld et al., 2011a, 2012), and give ground 
for a conceptualization of a versatile, multi-code capacity usage 
of high WMC participants. Furthermore, in the new ELU model, 
the implicit as well as the explicit processing mechanisms rely on 
phonological and semantic interactions. Semantic LTM can be 
used either for explicit "repair" of a distorted signal, for inference- 
making, or for implicit and rapid semantic priming. Mismatch 
will determine how time is shared between the explicit and 
implicit operations: the fast (implicit) RAMBPHO mechanism is 
always running until it is temporarily interrupted. When inter- 
rupted, the default situation is that it re-starts the analysis of the 
speech signal with whatever information is available (phonolog- 
ical/semantic). At the same time, mismatch will tune the system 
to use the explicit slow loop to repair violated expectations, again 
via semantic and phonological cues. 



Under time pressure, and given that the listener is happy to 
settle for the gist of the message (see Pichora-Fuller et al, 1998), 
low-level RAMBPHO processing may be overruled by explicit 
functions. RAMBPHO is in principle "blind" to the overall mean- 
ing of a message, in the sense that its sole function is to "unlock" 
the lexicon. But it is conceivable that it can be modified in terms 
of attention to certain attributes depending on semantic knowl- 
edge about speaker identity and topic (Mesgarani and Chang, 
2012). The crucial aspect is therefore not the specific kind of 
signal processing that temporarily interrupts RAMBPHO, but 
the modality-general explicit capacity to use and combine the 
available perceptual evidence and quality of the LTM knowledge. 
This takes place via different WMC-dependent executive mech- 
anisms such as inhibition, focusing of attention, and retrieval of 
contextual and semantic information. The sooner the brain can 
construct an interpretation of the message, the easier language 
processing becomes, and the content of a dialogue is more rapidly 
committed to more permanent memory encodings. 

In short, the new ELU model is a WMC-based model of 
a meaning prediction system (cf. Samuelsson and Ronnberg, 
1993; Federmeier, 2007; Hickok, 2012). Specifically, the settings 
of the system are regulated either explicitly (by some seman- 
tic/contextual instruction or explicit feedback) or by the neural 
consequences of high WMC (in terms of, e.g., brain oscilla- 
tions). Attention manipulations — seen as one way of pre-tuning 
the system — have recently proven to have cortical consequences 
in speech in noise tasks (Mesgarani and Chang, 2012; Wild et al., 
2012). 

In Figure 2, we illustrate how explicit/implicit processes inter- 
act over time. Each explicit "loop" is activated by a mismatch. 
The number of times the listener passes through an explicit loop 
depends generally on for example turn taking, competing speech, 
attention manipulations, or to distortions from signal processing 
in the hearing aid. 

NEW ELU PREDICTIONS 

We outline some new predictions that follow from the revised and 
updated ELU model. 

( 1 ) Signal distortion will tax WMC during speech understand- 
ing. This applies to different kinds of signal compression 
algorithms used in hearing aids, and to other kinds of dis- 
tortion (cf. Foo et al., 2007; Arehart et al, 2013). Even at 
favorable SNRs, WMC modulates the effect of signal process- 
ing on short-term retention of spoken materials (Ng et al., 
2013a). Still, effects relating to intended distortion of the 
target signal per se and the unwanted artifacts of signal pro- 
cessing (e.g., "musical noise" during binary masking) need to 
be teased apart. 

(2) WMC is predicted to modulate early attention mechanisms 
(Sorqvist et al., 2012a; cf. Kraus and Chandrasakaren, 2010; 
Kraus et al., 2012) and semantic framing (priming). 

(3) Classon et al. (2013a,b) demonstrated that WMC can com- 
pensate for phonological deficits. It also modulates the 
use of semantic cues during speech-in-noise understand- 
ing (Zekveld et al., 2012). It addition, it predicts facilita- 
tion of encoding operations (and subsequent episodic LTM) 
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FIGURE 2 | The new Ease of Language Understanding (ELU) model. In 

ideal listening conditions, multimodal RAMBPHO input matches a sufficient 
number of phonological attributes (i.e., above threshold) in the mental 
lexicon and lexical access proceeds rapidly and automatically. RAMBPHO 
may be preset by expectations — modulated by WM — concerning the 
phonological characteristics of the communicative signal, e.g., the language 
or regional accent of the communicative partner or by semantic or 
contextual constraints. When there is a mismatch (as in suboptimal 
listening conditions), WM "kicks in" to support listening (Ronnberg et al., 



2010). The explicit, WMC-dependent, processing loop uses both 
phonological and semantic LTM information to attempt to fill in or infer 
missing information, which also feeds back to RAMBPHO. The output of 
the system is some level of understanding or gist, which in turn induces a 
semantic framing of the next explicit loop. Another output from the 
system is episodic LTM, where information encoded into LTM is 
dependent on the type of processing carried out in WM. Explicit and 
implicit processes run in parallel, the implicit being rapid, the explicit is a 
relatively slow feedback loop. 



in conditions of speech-in-speech maskers (Sorqvist and 
Ronnberg, 2012). In short, participants with high WMC are 
predicted to better adapt to different task demands than par- 
ticipants with low WMC, and hence are more versatile in 
their use of semantic and phonological coding and re-coding 
after mismatch. 

(4) STM, and by inference, WM, is predicted to be more robust 
than LTM systems in response to impairment-related decline 
(Ronnberg et al., 2011a,b). This prediction should be further 
tested with different tasks assessing different memory systems 
before definite conclusions can be made. 

(5) WMC is related to effort (Koelewijn et al, 2012a,b), espe- 
cially to intermediate levels of effort (Rudner et al., 2012). 
Further work is needed to uncover underlying mechanisms. 

(6) Predictions for sign language understanding should focus 
on visual noise manipulations and on semantic maskers to 
assess the role of WMC in understanding sign language 
under challenging conditions (Ronnberg et al, 2004; Rudner 
et al, 2007; Cardin et al, 2013). By testing whether WMC is 
also invoked in conditions with visual noise, the analogous 
mechanism to mismatch in the spoken modality could be 
evaluated. 

RELATION TO OTHER MODELS 

The new ELU model differs from models of speech percep- 
tion (e.g., the TRACE model, McClelland and Elman, 1986; the 
Cohort model, Marslen-Wilson, 1987; and the NAM model, Luce 
and Pisoni, 1998) and also from the original notion of mis- 
match negativity (Naatanen and Escera, 2000) in its assumption 
that explicit WMC is called for whenever there is mismatch 
between language input and LTM representations. In this way, the 



mismatch mechanism — and the demand on WMC — is related 
to communication. Nevertheless, the ELU model is similar to 
the earlier speech perception models in that all acknowledge the 
importance of an interaction with LTM representations and that 
lexical access proceeds via some kind of model-specific retrieval 
mode. The ELU model especially focuses on how the perceptual 
systems interact with different memory systems. The cognitive 
hearing science aspect and the historical context of the ELU model 
has recently been reviewed elsewhere (Pichora-Fuller and Singh, 
2006; Arlinger et al, 2009). 

RAMBPHO focuses on the integration of phonological infor- 
mation from different sources and thus shares similarities with 
the episodic buffer introduced by Baddeley (2000). However, 
unlike Baddeley's model, the ELU model is geared toward the 
communicative outcome, i.e., language understanding, rather 
than WMC as such (Rudner and Ronnberg, 2008a,b). The fact 
that the need for explicit resources such as WMC is restricted to 
mismatch situations also represents a unique processing economy 
aspect of the ELU model. 

The ELU model is inspired by the notions and models of WM 
for read text presented by Daneman and Carpenter (1980) and 
Just and Carpenter (1992) in that it emphasizes both storage and 
processing components of WM. This is why we originally adopted 
the reading span task as a potentially important predictor variable 
of speech-in-noise performance, without introducing audibility 
problems. The trade-off between storage and processing is par- 
ticularly relevant for the ELU model in that hearing impairment 
typically puts extra pressure on the processing and inference- 
making that is needed to comprehend a sentence. Less storage 
and less encoding into episodic LTM are expected for partici- 
pants with hearing impairment compared to participants without 
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hearing impairment unless they have a high WMC. Of particu- 
lar relevance is the fact that Just and Carpenter (1992) showed 
that WMC constrains sentence comprehension during reading 
such that individuals with high WMC are better than individuals 
with low WMC at coping with more complex syntactic structures 
(e.g., object-relative clauses), maintaining ambiguous representa- 
tions of sentences, and resolving anaphora. Dealing with semantic 
or syntactic complexity is presumably very important for partic- 
ipants who are "mismatching" frequently during conversation. 
Here, the attention, inference-making, inhibition and storage 
abilities of individuals with high WMC play a crucial role. 

The ELU has some interesting similarities with the speech 
perception model by Poeppel et al. (2008) in that both models 
assume parallel processes (streams) that operate within differ- 
ent time-windows (cf. Hickok and Poeppel, 2004, 2007). For 
ELU the first time window is when phonological representations 
are formed in RAMBPHO to match representations in semantic 
LTM; the second is the slower explicit loop function. RAMPBHO 
seems to be a concept very similar to the phonological primal 
sketch suggested by Poeppel et al., where syllables mediate lex- 
ical access. Also, audiovisual integration seems to occur around 
250 ms, where visual information typically leads and affects the 
integration (van Wassenhove et al., 2007). We have speculated 
about the earlier (than syllabic) spectral-segmental kinds of anal- 
yses discussed in the Poeppel et al. (2008) paper (see Stenfelt 
and Ronnberg, 2009), primarily in terms of how different types 
of hearing impairment might affect perception of segmental 
features. 

The explicit, slow processing loop, is postdictive in the sense 
that mismatch, error-induced, signals may invoke some kind of 
WM-based inference-making. This was the main function for 
re-construction and inference-making in the old ELU model 
(Ronnberg, 2003; Ronnberg et al., 2008). However, as we have 
emphasized with the new ELU model, its predictive potential is 
now clearly spelled out in terms of the re-settings explicit pro- 
cesses may invoke, phonologically and semantically, and also 
because of the fine-tuning or synchronization by WM itself 
(Hickok, 2012). In keeping with Poeppel et al. (2008), the neu- 
ral basis of syllabic processing is likely to involve STS, lexical 
access supposedly involves MTG, while our recent study (Zekveld 
et al, 2012) is a first indication of a frontal (LIFG) WMC- 
based compensation for the explicit effort involved in decoding 
words/sentences in noise. Poeppel et al. (2008) advocate an anal- 
ysis by synthesis framework whereby initial segments of a spoken 
signal are matched against a hypothesis, "an internal forward 
model." The internal model is then updated against new seg- 
ments of speech approximately at every 30 ms, feeding back to 
several levels of representation including the phonological pri- 
mal sketch. One way of conceptualizing the hypothesis-driven, 
analysis-by-synthesis framework by Poeppel et al. (2008) may in 
fact be understood in terms of WMC. A high WMC helps keep 
several hypotheses alive, allowing for top-down feed-back at sev- 
eral points in time and at segmental, syllabic, lexical and semantic 
levels of representation (cf. Figure 4 in Poeppel et al, 2008, cf. 
Poeppel and Monahan, 2011). The probability of entertaining 
or maintaining a hypothesis in WM may then in part be deter- 
mined by Bayesian logic, "The quantity p(H|E) represents the 
likelihood of the hypothesis, given the sensory analysis; p(E|H) is 



the likelihood of the synthesis of the sensory data given the analy- 
sis" (p. 1080, Poeppel et al, 2008), where H represents the forward 
hypothesis and E the perceptual evidence. With an ELU perspec- 
tive, this will also be modulated by the WMC to hold several 
hypotheses, at different levels in the cognitive system, in mind. 

In the general context of dual stream models, addressing 
the interaction between ventral and dorsal attention networks, 
Asplund et al. (2010) found that so called surprise blindness, i.e., 
where a profound deficit in the detection of a goal-relevant target 
(a letter) as a result of the presentation of an unexpected and task- 
irrelevant stimulus (a face), causes activity in the inferior frontal 
junction. This manipulation represents an interaction between 
stimulus -driven and goal-directed, hypothesis-driven attention 
and may be compared to the cueing manipulations by Zekveld 
et al. (2011a, 2012). Resolutions of ambiguity also involve inter- 
actions between stimulus-driven and knowledge-driven processes 
(Rodd et al, 2012), which demand the integrative functions of 
LIFG. These examples may in fact be related to the new predic- 
tive and postdictive (feedback) interactions postulated in the new 
ELU model. 

As discussed by Arnal and Giraud (2012), implicit tempo- 
ral predictions of spoken stimuli represent one mechanism that 
may be modulated by slow delta-theta oscillations, whereas in 
the case of top-down, hypothesis-driven transmission of content 
specific information, beta oscillations may index a complemen- 
tary mechanism in speech comprehension. Similar kinds of dual 
mechanisms have been proposed by Golumbic et al. (2012) 
when tracking selective attention to a target voice while ignoring 
another voice in a cocktail party situation. Low frequency activ- 
ity typically corresponds to the speech envelope at lower auditory 
cortex levels, whereas high gamma power activity is reflected in 
the entrainment to the attended target voice only at later stages of 
processing, which also were cortically spread out to, e.g., inferior 
frontal cortex and anterior temporal cortex. This general result 
connects nicely with the Zekveld et al. (2012) data of WMC based 
compensation localized in LIFG and MTG. 

Finally, Andersson and colleagues demonstrated in a recent 
study (Anderson et al., 2013a), using structural equation 
modeling, that auditory WM, in combination with central audi- 
tory functions such as brain stem responses (e.g., pitch encoding), 
contributes to understanding speech in noise. Peripheral auditory 
measures did not account for any variance but musical experi- 
ence reinforced the effect of auditory WM. This is in line with 
our research ascribing a central role to WM for speech under- 
standing under adverse conditions. Interestingly, Anderson et al. 
(2013b) have also been able to show that brain stem responses 
to complex sounds, rather than hearing thresholds, predict self- 
reported speech-in-noise performance. These data agree with the 
Sorqvist et al. (2012a,b) data on the relationship between brain 
stem responses and WMC. Since WM is by definition an explicit 
processing and storage system, the data also fit with the fact that 
self-report — which taps into explicit awareness of speech process- 
ing (cf. Ng et al., 2013b) — has the capacity to reflect brain stem 
responses. 

In sum: although the ELU model shares underlying notions 
with other speech perception and WM models, its uniqueness 
lies in the connection between mismatch and WMC (explic- 
itly and postdictively), and implicitly and predictively, between 
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WM and RAMBPHO, and the roles played by the interaction 
between WM and other memory systems such as episodic and 
semantic LTM. 

LIMITATIONS 

One limitation of the new ELU model concerns the more exact 
definition of when a mismatch condition is at hand. We have seen 
a picture of results that suggests that many kinds of signal pro- 
cessing actually demand a higher dependence of WMC, at least 
initially, before some learning or acclimatization has occurred 
(cf. the first prediction). This of course also holds true for the 
case when a person has acclimatized to a certain kind of signal 
processing, and then is tested with another, thus breaking, the 
habitual phonological coding schemes. However, a critique that 
can be launched is that we a priori may have problems deter- 
mining the exact parameters for the mismatch induction. The 
problem of circularity is apparent. More empirical investigations 
into, e.g., determining whether it is the kind of signal processing 
or the artifacts caused by signal processing that determine mis- 
match and WM dependence will help clarify this issue. Another 
problem relates to the (so far) relatively few studies involving the 
neural correlates of WMC and speech understanding in noise. 
Future studies will also have to address the neural consequences of 
high vs. low WMC and how it modulates predictions at different 
linguistic levels (syllabic, lexical, semantic, and syntactic). 

CLINICAL IMPLICATIONS 

Given that WMC is crucially important for on-line processing of 
speech under adverse conditions as well as the ability to maintain 
its content for shorter or longer periods, then hearing aid man- 
ufacturers, speech-language pathologists and hearing health care 
professionals must take that into account. First, clinically relevant 
WMC tests need to be developed; tests that tap into the processes 
that have proven to be modality-general and optimal for both on- 
line processing of speech as well as for episodic LTM. This means 
normative data needs to be collected to determine age-dependent 
and impairment-specific performance levels and provide a clin- 
ical instrument for assessing WMC. By using visual-verbal tests 
audibility problems are avoided, thus disentangling potential per- 
ceptual degradation effects from WM performance. However, it 
is important to collect norms for different age-groups and lev- 
els of hearing impairment in combination, because there is also 
the possibility of more central, or cognitive side-effects of age and 
impairment. 

Second, individuals with low WMC seem to be initially 
susceptible to signal processing distortions from "aggressive" 



signal processing (fast amplitude compression, severe frequency 
compression, binary masking), although this susceptibility may 
decline after a period of familiarization (Rudner et al., 2011). For 
all individuals, concrete options are at hand for manipulations of 
the signal in the hearing aid: to increase amplification, alter input 
dynamics, to remove some information (= noise reduction) to 
get a benefit. But these manipulations come at a cost that is dif- 
ferent for different individuals. Thus, we advocate that the "dose 
of the medicine" (= the active ingredient), the intended benefit of 
signal processing and its side-effects (by-product of the medicine) 
must be tailored to the individual, such that the high WMC can 
have a more active ingredient (= more aggressive signal process- 
ing, compared to the low WMC who may be more susceptible to 
side-effects). This reasoning could in principle also be applied to 
acoustic design of other technologies. 

Third, the data we have presented suggest that many kinds of 
more advanced signal processing in hearing instruments demand 
WMC. The down-side of using advanced signal processing on a 
daily basis is that it demands effort and for any given individual 
with hearing loss, this may outweigh the benefit. Therefore, there 
is a need to develop new methods that assess effortful brain-work 
with more precision. Here, reaction time measures, pupil dila- 
tion indices or measures of evoked response potentials may prove 
to be useful signals for on-line adjustment of signal processing 
parameters in hearing instruments. 

Fourth, with a new cognitive hearing science perspective, it 
would be equally important to evaluate memory and compre- 
hension of the contents of a conversation in noise, as functional 
outcome measures, rather than only focusing on word recog- 
nition accuracy per se (Pichora-Fuller, 2007; Ronnberg et al, 
201 la). This can actually be seen as an indirect measure of cogni- 
tive spare capacity, or the residual cognitive capacity that remains 
once successful listening has taken place (Pichora-Fuller, 2007, 
2013;Mishra etal, 2013). 

Fifth, it is quite possible that to properly evaluate the effects 
of hearing aids and other interventions, a longitudinal study 
that also systematically manipulated the kind of signal processing 
would be quite informative. We know very little about the long- 
term effects of signal processing on cognition and how this may 
relate to or reduce the risk of dementia (Lin et al., 2011, 2013). 

Finally, an intervention study that evaluated the effects of WM 
training on speech in noise understanding would put the causal 
nature of WMC to the test (cf. McNab et al., 2009). Additionally, 
if one could study the neural correlates of this putative plastic 
change that would shed further light on the neural mechanisms 
involved. 
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